feat: implement DuckDB filesystem integration for Vortex file handling#6198
feat: implement DuckDB filesystem integration for Vortex file handling#61980ax1 merged 5 commits intovortex-data:developfrom
Conversation
Merging this PR will not alter performance
Comparing Footnotes
|
0ax1
left a comment
There was a problem hiding this comment.
Thanks for taking the time to look into this. There's a couple of issues around thread safety / locking, error handling and memory leaks.
If you used AI to assist in writing the code for your PR please mention this in your PR description (see: https://github.com/vortex-data/vortex/blob/develop/CONTRIBUTING.md).
0ax1
left a comment
There was a problem hiding this comment.
Almost there, thanks for taking the time to address the comments!
66fedc0 to
b6e15a3
Compare
Hi @0ax1 , any other comments? |
Currently swamped with other work but will get back to it when I find a free moment. |
|
@iceboundrock - quick question on this. Do you know if it's possible to grab the config back out from DuckDB's filesystems? I don't want to block this PR, don't worry(!), but we do have plans to implement a high performance I/O layer at some point and it would probably make more sense to try and configure our own I/O using duckdb configs, rather than attempt to optimize calling back into DuckDB's I/O layer. |
Could you please elaborate on what We probably can do the same as the httpfs extension does. Code link: |
Signed-off-by: Ruoshi <me@ruoshi.li>
…chemas. Signed-off-by: Ruoshi <me@ruoshi.li>
20ea783 to
b2db9de
Compare
Let's handle this in a follow up to limit the scope of this PR. |
0ax1
left a comment
There was a problem hiding this comment.
PR looks great and CI (including all benchmarks) pass.
Last bit of fine-tuning and this is ready to land.
…ated function signatures Signed-off-by: Ruoshi Li <iceboundrock@gmail.com>
…ety and update write/flush methods to use spawn_blocking Signed-off-by: Ruoshi Li <iceboundrock@gmail.com>
There was a problem hiding this comment.
Thank you for the contribution! @iceboundrock 🎉
Will follow up on that! |
Sorry, would you mind letting me know what are And I think it may be because the new version needs to install & load |
|
Oh that's an interesting point. Definitely worth us pre-loading the extension, although it ships with core DuckDB so there wouldn't be any download. Coalescing configs determine how to combine small individual requests to S3 into fewer larger requests to avoid the overhead of each request. We have made a PR to match them up to the old ones for one. The odd thing for me is that the benchmarks didn't regress deterministically, they bounce up and down. |
The install and load was addressed here: #6565. And the coalescing was fixed here: #6555. Must be sth else in the mix why the S3 perf regressed. |
Thanks @gatesn . Based on my experience, even DuckDB doesn’t download the extensions, it still adds 2~4 seconds to my AWS Lambda (ARM 64-bit and 512MB memory) cold start time, just to install and load the |


Uh oh!
There was an error while loading. Please reload this page.